The Lattice Project: A Multi-model Grid Computing System
This thesis presents The Lattice Project, a system that combines multiple models of Grid computing. Grid computing is a paradigm for leveraging multiple distributed computational resources to solve fundamental scientific problems that require large amounts of computation. The system combines the traditional Service model of Grid computing with the Desktop model of Grid computing, and is thus capable of utilizing diverse resources such as institutional desktop computers, dedicated computing clusters, and machines volunteered by the general public to advance science. The production Grid system includes a fully-featured user interface, support for a large number of popular scientific applications, a robust Grid-level scheduler, and novel enhancements such as a Grid-wide file caching scheme. A substantial amount of scientific research has already been completed using The Lattice Project.
Computational Methods to Advance Phylogenomic Workflows
Phylogenomics refers to the use of genome-scale data in phylogenetic analysis. There are several methods for acquiring genome-scale, phylogenetically-useful data from an organism that avoid sequencing the entire genome, thus reducing cost and effort, and enabling one to sequence many more individuals. In this dissertation we focus on one method in particular, RNA sequencing, and the concomitant use of assembled protein-coding transcripts in phylogeny reconstruction. Phylogenomic workflows involve tasks that are algorithmically and computationally demanding, in part due to the large amount of sequence data typically included in such analyses. This dissertation applies techniques from computer science to improve methodology and performance associated with phylogenomic workflow tasks such as sequence classification, transcript assembly, orthology determination, and phylogenetic analysis. While the majority of the methods developed in this dissertation can be applied to the analysis of diverse organismal groups, we primarily focus on the analysis of transcriptome data from Lepidoptera (moths and butterflies), generated as part of a collaboration known as “Leptree”.
Linoleic acid participates in the response to ischemic brain injury through oxidized metabolites that regulate neurotransmission.
Linoleic acid (LA; 18:2 n-6), the most abundant polyunsaturated fatty acid in the US diet, is a precursor to oxidized metabolites that have unknown roles in the brain. Here, we show that oxidized LA-derived metabolites accumulate in several rat brain regions during CO2-induced ischemia and that LA-derived 13-hydroxyoctadecadienoic acid, but not LA, increases somatic paired-pulse facilitation in rat hippocampus by 80%, suggesting bioactivity. This study provides new evidence that LA participates in the response to ischemia-induced brain injury through oxidized metabolites that regulate neurotransmission. Targeting this pathway may be therapeutically relevant for ischemia-related conditions such as stroke.
Data from: Pan-genome and phylogeny of Bacillus cereus sensu lato
Background: Bacillus cereus sensu lato (s. l.) is an ecologically diverse bacterial group of medical and agricultural significance. In this study, I use publicly available genomes to characterize the B. cereus s. l. pan-genome and perform the largest phylogenetic and population genetic analyses of this group to date in terms of the number of genes and taxa included. With these fundamental data in hand, I identify genes associated with particular phenotypic traits (i.e., "pan-GWAS" analysis), and quantify the degree to which taxa sharing common attributes are phylogenetically clustered.
Methods: A rapid k-mer based approach (Mash) was used to create reduced representations of selected Bacillus genomes, and a fast distance-based phylogenetic analysis of this data (FastME) was performed to determine which species should be included in B. cereus s. l. The complete genomes of eight B. cereus s. l. species were annotated de novo with Prokka, and these annotations were used by Roary to produce the B. cereus s. l. pan-genome. Scoary was used to associate gene presence and absence patterns with various phenotypes. The orthologous protein sequence clusters produced by Roary were filtered and used to build HaMStR databases of gene models that were used in turn to construct phylogenetic data matrices. Phylogenetic analyses used RAxML, DendroPy, ClonalFrameML, PAUP, and SplitsTree. Bayesian model-based population genetic analysis assigned taxa to clusters using hierBAPS. The genealogical sorting index was used to quantify the phylogenetic clustering of taxa sharing common attributes.
Results: The B. cereus s. l. pan-genome currently consists of ≈60,000 genes, ≈600 of which are "core" (common to at least 99% of taxa sampled). Pan-GWAS analysis revealed genes associated with phenotypes such as isolation source, oxygen requirement, and ability to cause diseases such as anthrax or food poisoning. Extensive phylogenetic analyses using an unprecedented amount of data produced phylogenies that were largely concordant with each other and with previous studies. Phylogenetic support as measured by bootstrap probabilities increased markedly when all suitable pan-genome data was included in phylogenetic analyses, as opposed to when only core genes were used. Bayesian population genetic analysis recommended subdividing the three major clades of B. cereus s. l. into nine clusters. Taxa sharing common traits and species designations exhibited varying degrees of phylogenetic clustering.
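The "core" definition used above (a gene common to at least 99% of taxa sampled) can be illustrated with a minimal Python sketch. This is not Roary's implementation: the function name `core_genes`, the dictionary-of-sets input format, and the toy gene and taxon names are all assumptions made for illustration.

```python
def core_genes(presence, taxa, threshold=0.99):
    """Return the genes carried by at least `threshold` of the sampled taxa.

    presence: dict mapping a gene name to the set of taxa that carry it
              (a hypothetical in-memory form of a presence/absence matrix).
    taxa:     set of all taxa sampled.
    """
    n = len(taxa)
    if n == 0:
        return []
    return sorted(
        gene
        for gene, carriers in presence.items()
        if len(carriers & taxa) / n >= threshold
    )


# Toy data with hypothetical names; not taken from the study.
taxa = {"t1", "t2", "t3", "t4"}
presence = {
    "gene_a": {"t1", "t2", "t3", "t4"},  # present in every taxon
    "gene_b": {"t2"},                    # accessory gene
    "gene_c": {"t1", "t2", "t3", "t4"},  # present in every taxon
    "gene_d": {"t1", "t2", "t3"},        # 75% of taxa, below the threshold
}
core = core_genes(presence, taxa)
```

With only four taxa, the 99% threshold behaves like "present in all taxa"; the tolerance only matters once the sample exceeds one hundred taxa.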
B. cereus sensu lato phylogenetic trees
Contains the Bacillus FastME tree, the B. cereus s. l. accessory binary tree produced by Roary, the RAxML maximum likelihood trees (ML_1–ML_8), and the PAUP maximum parsimony tree (MP_1).
B. cereus sensu lato data matrices
Contains the Bacillus Mash distance matrix, the B. cereus s. l. pan-genome binary data matrix, and the six B. cereus s. l. concatenated data matrices used in the study along with associated RAxML partition specifications.
create_data_matrix Perl script
The create_data_matrix Perl script aligns orthologous protein groups produced by HaMStR, converts alignments to CDS equivalents, applies the consensus method, and concatenates individual gene alignments to produce the final data matrix.
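The final concatenation step described above can be sketched in Python as follows. This is an illustrative sketch, not the create_data_matrix script itself (which is written in Perl and also performs alignment, CDS conversion, and the consensus step). The function name `concatenate_alignments`, the input format, and the choice to pad taxa missing from a gene with gap characters are assumptions.

```python
def concatenate_alignments(alignments, missing="-"):
    """Concatenate per-gene alignments into one data matrix.

    alignments: list of dicts, each mapping taxon -> aligned sequence;
                all sequences within one dict must have equal length.
    Returns (matrix, partitions), where matrix maps taxon -> concatenated
    sequence and partitions lists the 1-based (start, end) columns of each gene.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon absent from this gene is padded with gaps.
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


# Toy alignments with hypothetical taxa and sequences.
gene1 = {"A": "ATG", "B": "ATA"}
gene2 = {"A": "CC", "C": "CT"}
matrix, partitions = concatenate_alignments([gene1, gene2])
```

The returned column ranges are the kind of per-gene boundaries a partition specification would record.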
B. cereus sensu lato HaMStR databases
Contains the four B. cereus s. l. HaMStR databases used in this study. Each HaMStR database contains BLAST databases, FASTA files, protein and CDS alignments, and HMMs.
Brain oxylipin concentrations following hypercapnia/ischemia: effects of brain dissection and dissection time
PUFAs are precursors to bioactive oxylipin metabolites that increase in the brain following CO2-induced hypercapnia/ischemia. It is not known whether the brain-dissection process and its duration also alter these metabolites. We applied CO2 with or without head-focused microwave fixation for 2 min to evaluate the effects of CO2-induced asphyxiation, dissection, and dissection time on brain oxylipin concentrations. Compared with head-focused microwave fixation (control), CO2 followed by microwave fixation prior to dissection increased oxylipins derived from lipoxygenase (LOX), 15-hydroxyprostaglandin dehydrogenase (PGDH), cytochrome P450 (CYP), and soluble epoxide hydrolase (sEH) enzymatic pathways. This effect was enhanced when the duration of postmortem ischemia was prolonged by 6.4 min prior to microwave fixation. Brains dissected from rats subjected to CO2 without microwave fixation showed greater increases in LOX, PGDH, CYP and sEH metabolites compared with all other groups, as well as increased cyclooxygenase metabolites. In nonmicrowave-irradiated brains, sEH metabolites and one CYP metabolite correlated positively and negatively with dissection time, respectively. This study presents new evidence that the dissection process and its duration increase brain oxylipin concentrations, and that this is preventable by microwave fixation. When microwave fixation is not available, lipidomic studies should account for dissection time to reduce these artifacts.